Fast and Scalable Analysis of Massive Social Graphs

نویسندگان

  • Xiaohan Zhao
  • Alessandra Sala
  • Haitao Zheng
  • Ben Y. Zhao
چکیده

Graph analysis is a critical component of applications such as online social networks, protein interactions in biological networks, and Internet traffic analysis. The arrival of massive graphs with hundreds of millions of nodes, e.g. social graphs, presents a unique challenge to graph analysis applications. Most of these applications rely on computing distances between node pairs, which for large graphs can take minutes to compute using traditional algorithms such as breadth-first-search (BFS). In this paper, we study ways to enable scalable graph processing on today’s massive graphs. We explore the design space of graph coordinate systems, a new approach that accurately approximates node distances in constant time by embedding graphs into coordinate spaces. We show that a hyperbolic embedding produces relatively low distortion error, and propose Rigel, a hyperbolic graph coordinate system that lends itself to efficient parallelization across a compute cluster. Rigel produces significantly more accurate results than prior systems, and is naturally parallelizable across compute clusters, allowing it to provide accurate results for graphs up to 43 million nodes. Finally, we show that Rigel’s functionality can be easily extended to locate (near-) shortest paths between node pairs. After a onetime preprocessing cost, Rigel answers node-distance queries in 10’s of microseconds, and also produces shortest path results up to 18 times faster than prior shortest-path systems with similar levels of accuracy.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Big Graph Mining: Algorithms, Anomaly Detection, and Applications

Graphs are everywhere in our lives: social networks, the World Wide Web, biological networks, and many more. The size of real-world graphs are growing at unprecedented rate, spanning millions and billions of nodes and edges. What are the patterns and anomalies in such massive graphs? How to design scalable algorithms to find them? How can we make sense of very large graphs? And what kind of rea...

متن کامل

Efficient Shortest Paths on Massive Social Graphs (Invited Paper)

Analysis of large networks is a critical component of many of today’s application environments, including online social networks, protein interactions in biological networks, and Internet traffic analysis. The arrival of massive network graphs with hundreds of millions of nodes, e.g. social graphs, presents a unique challenge to graph analysis applications. Most of these applications rely on co...

متن کامل

Exploring and Making Sense of Large Graphs

Graphs naturally represent information ranging from links between webpages, to friendships in social networks, to connections between neurons in our brains. These graphs often span billions of nodes and interactions between them. Within this deluge of interconnected data, how can we find the most important structures and summarize them? How can we efficiently visualize them? How can we detect a...

متن کامل

An Integrated, Fast and Scalable Approach for Biological Network Analysis

Analysis of biological networks has become a major challenge due to the recent development of highthroughput techniques which are rapidly producing very large datasets. A number of algorithms, techniques and applications have been proposed to infer new information from various biological networks. In practice, most of the available algorithms and related tools for exploring different properties...

متن کامل

Distributed and Scalable Graph Pattern Matching: Models and Algorithms

Graph pattern matching is a fundamental operation for many applications, and it is exhaustively studied in its classical forms. Nevertheless, there are newly emerging applications, like analyzing hyperlinks of the web graph and analyzing associations in a social network, that need to process massive graphs in a timely manner. Regarding the extremely large size of these graphs and knowledge they...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • CoRR

دوره abs/1107.5114  شماره 

صفحات  -

تاریخ انتشار 2011